Introduction to bioacoustics

Bioacoustics is the study of how living organisms produce, transmit, and receive sounds. It can answer questions about behaviour, conservation interventions, population monitoring, and more. There are many programs to support different parts of bioacoustic workflows, but I want to feature some tools working within R. Let’s start at the beginning.

In this tutorial, I cover some basics, including loading, organizing, and displaying sound data. We’ll use the tuneR and monitoR packages. In the following tutorials, I’ll cover classifications (species recognizers) and soundscape indices.

Let’s get started.

Working with bioacoustic data generally falls into two broad workflows:

  1. Working with species detections: using manual, automated, or mixed approaches to detect one or more species in sound recordings and then make inferences (e.g., asking how the occurrence of a species, a group of species, or community diversity relates to other variables).

  2. Working with soundscape indices: requires using sound data to calculate indices of biological or non-biological sound to make inferences (e.g., questions about noise pollution, or making inferences about community diversity based on general diversity of the soundscape).

For either, we need sound data that we can load, examine, measure, and manipulate. Let’s run through some basics of loading bioacoustic data into our R environment. We’ll work with .wav files, a common format produced by both Autonomous Recording Units (ARUs) and phones.

library(tuneR)
library(monitoR)
## 
## Attaching package: 'monitoR'
## The following object is masked from 'package:tuneR':
## 
##     readMP3
# We'll load a short recording of a Black-throated Green Warbler (btnw) vocalization, 
# which is included in the monitoR package
data(btnw)

# Look at contents
btnw
## 
## Wave Object
##  Number of Samples:      72001
##  Duration (seconds):     3
##  Samplingrate (Hertz):   24000
##  Channels (Mono/Stereo): Mono
##  PCM (integer format):   TRUE
##  Bit (8/16/24/32/64):    16

What does this information mean?

  • We loaded a focal recording that is a Wave Object: btnw
  • The number of samples: effectively the number of data points
  • Duration: the length in seconds of the recording
  • The sampling rate (Hertz): related to the quality of the data, this is the number of samples taken per second when a continuous signal is sampled into a discrete one (1 Hz = 1 sample per second)
  • Channels: Whether data is recorded with one (mono) or two (stereo) input channels. We’ll use examples of single channels here, but two channels can be helpful for questions that need to identify how many individuals are detected, similar to a point count for birds (i.e., a human listening in the field has two input channels).

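To build intuition for these properties, we can synthesize a Wave object ourselves. The sketch below uses tuneR’s sine() to generate a pure tone (a toy stand-in for a field recording) and then checks that duration is just the number of samples divided by the sampling rate:

```r
library(tuneR)

# Synthesize a 2-second, 440 Hz pure tone sampled at 24000 Hz
tone <- sine(freq = 440, duration = 2, samp.rate = 24000,
             bit = 16, xunit = "time")

# Printing shows the same properties as btnw: samples, duration,
# sampling rate, channels, and bit depth
tone

# Duration (seconds) = number of samples / sampling rate (samples per second)
length(tone@left) / tone@samp.rate
```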
Next, let’s visualize the contents of the recording. Spectrograms are the primary way to visualize sound data: they show time on the horizontal axis, frequency on the vertical axis, and sound intensity as the colour of each pixel. Species’ vocalizations tend to have distinctive patterns, and experience looking at spectrograms helps us recognize species from the image pattern alone. Species classifiers (next post) typically identify patterns in these image representations of sound.

viewSpec(btnw,
         ovlp = 90,
         spec.col = viridis::viridis(100))

Loading our own sound recordings with multiple signals

Often bioacoustic data include many (and long) recordings, with different species vocalizing at different times, from which we want to extract information. That information usually lives at the level of individual vocalizations, which require manual, automated, or mixed identification.

Let’s explore using a recording I made at Oak Hammock Marsh (Manitoba, Canada) that includes multiple species.

# Load the WAV file using tuneR::readWave
marshSounds <- readWave("marshSounds_2025-06-02_1144.wav")

# View properties
marshSounds
## 
## Wave Object
##  Number of Samples:      2151360
##  Duration (seconds):     48.78
##  Samplingrate (Hertz):   44100
##  Channels (Mono/Stereo): Mono
##  PCM (integer format):   TRUE
##  Bit (8/16/24/32/64):    16

With longer recordings (even this one, which is less than a minute), the patterns become harder to see unless we zoom into specific parts of the spectrogram, which we can do by changing the start.time and page.length arguments. Below, I’ll zoom in on one part of marshSounds, which more clearly shows the vocalization of a Clay-coloured Sparrow with three buzzy bursts centred around 6 kHz.

# View spec at specific times and shorter page length
viewSpec(marshSounds, 
         start.time = 14, # start at second 14
         page.length = 4, # show a 4-second page
         ovlp = 90, # window overlap in the Fourier Transform (%)
         spec.col = viridis::viridis(100) # use a viridis colour scale
         )

The monitoR package contains convenience functions for annotating .wav files and also reading .csv annotations later in workflows. By adding the annotate = TRUE argument, we can interactively outline and label vocalizations within the Wave object. The results are saved as a .csv in the working directory, which can be named and later used for analyses. Below, I show the code for initiating annotation, and a screenshot of what an annotated file looks like with the bounding box around the Clay-coloured Sparrow vocalization (ccsp).

viewSpec(marshSounds, 
         start.time = 14,
         page.length = 5,
         spec.col = viridis::viridis(100),
         annotate = TRUE # can be used to draw bounding boxes and label vocalizations
         )

We can read the annotations back in; they are saved as .csv files (TMPannotations.csv).

# Read annotations file
marshSoundsAnnotations <- read.csv("TMPannotations.csv")

# Annotation table has rows for detections with start/end time, min/max frequency, and name (label)
marshSoundsAnnotations
##   start.time end.time min.frq max.frq name
## 1     14.307   16.996  4.0651  8.4376 ccsp
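Because the annotation table stores start/end times and frequency bounds, we can use it to cut labelled clips out of the longer recording, for example with tuneR’s extractWave() and writeWave(). This is a sketch assuming the marshSounds and marshSoundsAnnotations objects from above; the output filename is hypothetical:

```r
# Cut the annotated Clay-coloured Sparrow vocalization out of the
# longer recording, using the start/end times from the annotation table
ccspClip <- extractWave(marshSounds,
                        from = marshSoundsAnnotations$start.time[1],
                        to = marshSoundsAnnotations$end.time[1],
                        xunit = "time")

# Save the clip as its own .wav file for later listening or verification
writeWave(ccspClip, "ccsp_clip.wav")
```

Clips extracted this way are handy for building verified reference libraries, which become useful when we move on to species classifiers.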

There are many other programs for annotating sound data where the annotations can be read into R, including Raven (Rraven, though now archived), WildTrax (wildRtrax), and others.

Many of the tools for bioacoustic data analyses are in python instead of R. My goal is to share some of the functionality for bioacoustic analyses in R, but I acknowledge there are many feasible workflows and tools, depending on the scientific question, data, and collaborators involved.

Review and next steps

In this short post we:

  • Read acoustic data from .wav files as Wave objects.
  • Examined properties of the Wave objects, including duration and quality.
  • Displayed spectrograms of individual vocalizations and longer soundscapes.
  • Demonstrated how recordings could be annotated within R using monitoR, and then have results later used for analyses (e.g., on community differences between treatments or time).

In the next tutorial, we’ll use species classifiers to extract data from acoustic data, which can be used for analyses and human listening verification.